Goto

Collaborating Authors

 correlation detection


Model Correlation Detection via Random Selection Probing

Chen, Ruibo, Zhang, Sheng, Wu, Yihan, Zheng, Tong, Mai, Peihua, Huang, Heng

arXiv.org Artificial Intelligence

The growing prevalence of large language models (LLMs) and vision-language models (VLMs) has heightened the need for reliable techniques to determine whether a model has been fine-tuned from or is even identical to another. Existing similarity-based methods often require access to model parameters or produce heuristic scores without principled thresholds, limiting their applicability. We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test. RSP optimizes textual or visual prefixes on a reference model for a random selection task and evaluates their transferability to a target model, producing rigorous p-values that quantify evidence of correlation. To mitigate false positives, RSP incorporates an unrelated baseline model to filter out generic, transferable features. We evaluate RSP across both LLMs and VLMs under diverse access conditions for reference models and test models. Experiments on fine-tuned and open-source models show that RSP consistently yields small p-values for related models while maintaining high p-values for unrelated ones. Extensive ablation studies further demonstrate the robustness of RSP. These results establish RSP as the first principled and general statistical framework for model correlation detection, enabling transparent and interpretable decisions in modern machine learning ecosystems.


The graph alignment problem: fundamental limits and efficient algorithms

Ganassali, Luca

arXiv.org Machine Learning

Similarly to many other inference problems in planted models, we are interested in understanding the fundamental information-theoretical limits as well as the computational hardness of graph alignment. First, we study the Gaussian setting, when the graphs are complete and the signal lies on correlated Gaussian edges weights. We prove that the exact recovery task exhibits a sharp information-theoretic threshold (and characterize it), and study a simple and natural spectral method for recovery, EIG1, which consists in aligning the leading eigenvectors of the adjacency matrices of the two graphs. While most of the recent work on the subject was dedicated to recovering the hidden signal in dense graphs, we next explore graph alignment in the sparse regime, where the mean degree of the nodes are constant, not scaling with the graph size. In this particularly challenging setting, for sparse Erdős-Rényi graphs, only a fraction of the nodes can be correctly matched by any algorithm.


Statistical limits of correlation detection in trees

Ganassali, Luca, Massoulié, Laurent, Semerjian, Guilhem

arXiv.org Machine Learning

In this paper we address the problem of testing whether two observed trees $(t,t')$ are sampled either independently or from a joint distribution under which they are correlated. This problem, which we refer to as correlation detection in trees, plays a key role in the study of graph alignment for two correlated random graphs. Motivated by graph alignment, we investigate the conditions of existence of one-sided tests, i.e. tests which have vanishing type I error and non-vanishing power in the limit of large tree depth. For the correlated Galton-Watson model with Poisson offspring of mean $\lambda>0$ and correlation parameter $s \in (0,1)$, we identify a phase transition in the limit of large degrees at $s = \sqrt{\alpha}$, where $\alpha \sim 0.3383$ is Otter's constant. Namely, we prove that no such test exists for $s \leq \sqrt{\alpha}$, and that such a test exists whenever $s > \sqrt{\alpha}$, for $\lambda$ large enough. This result sheds new light on the graph alignment problem in the sparse regime (with $O(1)$ average node degrees) and on the performance of the MPAlign method studied in Ganassali et al. (2021), Piccioli et al. (2021), proving in particular the conjecture of Piccioli et al. (2021) that MPAlign succeeds in the partial recovery task for correlation parameter $s>\sqrt{\alpha}$ provided the average node degree $\lambda$ is large enough. As a byproduct, we identify a new family of orthogonal polynomials for the Poisson-Galton-Watson measure which enjoy remarkable properties. These polynomials may be of independent interest for a variety of problems involving graphs, trees or branching processes, beyond the scope of graph alignment.


On Correlation Detection and Alignment Recovery of Gaussian Databases

Tamir, Ran

arXiv.org Artificial Intelligence

Two fundamental problems in the statistical analysis of Gaussian databases are correlation detection (or testing) and alignment recovery. Correlation detection of two databases is basically a hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternate hypothesis, there exists a permutation, for which the databases are correlated. In this task, the main objective is to attain the best trade-off between the type-I and type-II error probabilities. In the problem of database alignment recovery, we make an assumption that the two databases are correlated, and want to estimate the underlying permutation. The objective is to minimize the probability of alignment error. While alignment recovery of databases with n sequences, each containing d Gaussian entries has been recently studied in [1] and correlation detection of such Gaussian databases has been lately explored in [2], it seems very natural to tackle the two individual problems together as a joint problem of correlation detection and alignment recovery. In addition, we also refer to the problem of partial alignment recovery, in which one would like to estimate only part of the underlying alignment between the databases. The main reasons for preferring partial alignment recovery instead of full alignment recovery are as follows.